Lexical Knowledge Acquisition from Corpora
Abstract
The paper presents a computational environment that supports the development of a lexicon for natural language processing. The underlying idea is to use up-to-date language technologies to minimize both the human labor and the inconsistency that are unavoidable when a lexicon is compiled manually. The proposed environment enables the efficient construction of a consistent and rich lexicon. Among its major components, this paper focuses on the compilation (or acquisition) of a subcategorization frame lexicon from parsed corpora. In particular, it discusses issues in the semi-automatic sense classification of polysemous verbs and in the probabilistic learning of subcategorization preferences.
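As a rough illustration of what learning a subcategorization preference from parsed data can involve, the sketch below estimates P(frame | verb) by plain relative frequency (maximum likelihood) over (verb, frame) pairs. It assumes these pairs have already been extracted from a parsed corpus. This is not the paper's probabilistic model, which the abstract does not specify. The function name `estimate_subcat_preferences`, the frame labels, and the `observations` data are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_subcat_preferences(pairs):
    """Estimate P(frame | verb) by relative frequency.

    pairs: iterable of (verb, frame) tuples, assumed (hypothetically)
    to have been extracted from a parsed corpus beforehand.
    Returns a dict mapping each verb to a dict of frame -> probability.
    """
    # Count how often each frame occurs with each verb.
    counts = defaultdict(Counter)
    for verb, frame in pairs:
        counts[verb][frame] += 1

    # Normalize each verb's counts so its frame probabilities sum to 1.
    prefs = {}
    for verb, frame_counts in counts.items():
        total = sum(frame_counts.values())
        prefs[verb] = {frame: n / total for frame, n in frame_counts.items()}
    return prefs


# Hypothetical observations; the frame labels are illustrative only.
observations = [
    ("give", "NP_NP"),
    ("give", "NP_PP-to"),
    ("give", "NP_NP"),
    ("sleep", "INTRANS"),
]
prefs = estimate_subcat_preferences(observations)
print(prefs["give"])
```

Unsmoothed relative frequencies assign zero probability to any frame never observed with a verb. A practical preference model would normally add smoothing or back off to frame probabilities across verb classes, which is one place where a sense classification of polysemous verbs could come into play.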
Similar resources
An Application of Lexical Semantics to Knowledge Acquisition from Corpora
In this paper, we describe a program of research designed to explore how a lexical semantic theory may be exploited for extracting information from corpora suitable for use in Information Retrieval applications. Unlike purely statistical collocational analyses, the framework of a semantic theory allows the automatic construction of predictions about semantic relationships among words ap...
Combining NLP and statistical techniques for lexical acquisition
The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. This paper presents an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines ...
Corpus-Based Induction of Lexical Representation and Meaning
The acquisition of linguistic knowledge, i.e., the identification, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selec...
Lexical Database for Multiple Languages: Multilingual Word Semantic Network
Data mining and knowledge engineering have become difficult tasks owing to the large amount of data now available on the web. The validity and reliability of data have also become a major point of debate in knowledge acquisition. In addition, acquiring knowledge from different languages has become another concern. Many language translators and corpora have been developed, but the function of these translators an...
Automatic lexical acquisition from corpora: some limitations and tentative solutions
This paper deals with lexical acquisition. We take another look at some experiments we have recently carried out on the automatic acquisition of lexical resources from French corpora. We describe the architecture of our system for lexical acquisition. We formulate the hypothesis that some of the limitations of the current system are mainly due to a poor representation of the constraints used. F...
In So Many Words: Knowledge as a Lexical Phenomenon
Lexical knowledge is knowledge that can be expressed in words. Circular though this may seem, we think it provides a perfectly reasonable point of departure, for, in line with a long-standing philosophical tradition, it posits communicability as the most characteristic aspect of lexical knowledge. Knowledge representation systems should be designed so as to fit lexical data rather than the other...
Publication date: 2007